COMPTIA DATA+ Exam DA0-001 Questions V8.02
CompTIA Data+ Topics - CompTIA Data+ Certification
Free DA0-001 Demo Questions [2022] Check Quality OF DA0-001 Materials

1. Q3 2020 has just ended, and a data analyst now needs to create an ad-hoc sales report that shows how well the Q3 2020 promotion performed compared with last year's Q3 promotion. Which of the following date parameters should the analyst use?
A. 2019 vs. YTD 2020
B. Q3 2019 vs. Q3 2020
C. YTD 2019 vs. YTD 2020
D. Q4 2019 vs. Q3 2020
Answer: B

2. A data analyst has been asked to create an ad-hoc sales report for the Chief Executive Officer (CEO). Which of the following should be included in the report?
A. The sales representatives' home addresses.
B. Line-item SKU numbers.
C. YTD total sales.
D. The customers' first and last names.
Answer: C

3. Which of the following can be used to translate data into another form so that it can only be read by a user who has a key or a password?
A. Data encryption.
B. Data transmission.
C. Data protection.
D. Data masking.
Answer: A
Explanation: A. Data encryption. Data encryption translates data from plaintext (unencrypted) to ciphertext (encrypted). Data is encrypted with an encryption key, and only users who hold the matching decryption key can turn the ciphertext back into readable plaintext.

4. Which of the following is an example of a discrete data type?
A. 8 in (20 cm)
B. 5 kids
C. 2.5 mi (4 km)
D. 10.7 lb (4.9 kg)
Answer: B

5. Which of the following contains alphanumeric values?
A. 10.1E²
B. 13.6
C. 1347
D. A3J7
Answer: D

6. A junior web developer is building a new application where users can upload short videos. The first task is to create a homepage that shows the headline "Upload Your Short Videos" and a clickable button that says "upload now". Which of the following HTML code snippets would help the developer complete the task successfully?
A. <span>Upload Your Short Videos</span><button>upload now</button>
B. <p>Upload Your Short Videos</p><p>upload now</p>
C. <h1>Upload Your Short Videos</h1><button>upload now</button>
D. <h1>Upload Your Short Videos</h1><h1>upload now</h1>
Answer: C
Explanation: The correct answer is C: <h1>Upload Your Short Videos</h1><button>upload now</button>. The <h1> to <h6> tags are used to define HTML headings: <h1> defines the most important heading, and <h6> defines the least important heading. Note: use only one <h1> per page; it should represent the main heading or subject of the whole page. The <button> tag defines a clickable button.

7. A web developer wants to ensure that malicious users can't type SQL statements when they are asked for input, such as their username or user ID. Which of the following query optimization techniques would effectively prevent SQL injection attacks?
A. Indexing.
B. Subset of records.
C. Temporary table in the query set.
D. Parametrization.
Answer: D
Explanation: The correct answer is D: Parametrization. Parameterized SQL queries let you place parameters in an SQL query instead of constant values. A parameter takes a value only when the query is executed, which allows the query to be reused with different values and for different purposes. Because the supplied values are bound as data rather than spliced into the SQL text, input such as a username cannot change the structure of the query. Parameterized statements are supported by most database client libraries and SDKs. For example, you could create the following conditional SQL query, which contains a parameter for the course name:
SELECT * FROM ExamsDigest WHERE coursename = ? ORDER BY tagname
SQL injection is best prevented through the use of parameterized queries.

8. Consider the following dataset, which contains information about houses that are for sale. Which of the following string manipulation commands will combine the address and region name columns to create a full address?
Desired result:
full_address
-------------------------
85 Turner St, Northern Metropolitan
25 Bloomburg St, Northern Metropolitan
5 Charles St, Northern Metropolitan
40 Federation La, Northern Metropolitan
55a Park St, Northern Metropolitan
A. SELECT CONCAT(address, ', ', regionname) AS full_address FROM melb LIMIT 5;
B. SELECT CONCAT(address, '-', regionname) AS full_address FROM melb LIMIT 5;
C. SELECT CONCAT(regionname, ', ', address) AS full_address FROM melb LIMIT 5;
D. SELECT CONCAT(regionname, '-', address) AS full_address FROM melb LIMIT 5;
Answer: A
Explanation: The correct answer is A: SELECT CONCAT(address, ', ', regionname) AS full_address FROM melb LIMIT 5;
String manipulation (or string handling) is the process of changing, parsing, splicing, pasting, or analyzing strings. SQL is used for managing data in a relational database. The CONCAT() function adds two or more strings together in the order given, so address must come first and the ', ' separator must sit between the two columns to produce the result shown above.
Syntax: CONCAT(string1, string2, ..., string_n)
Parameter values: string1, string2, ..., string_n (required): the strings to add together.

9. The ACME Corporation hired an analyst to detect data quality issues in its Excel documents. Which of the following are the most common issues? (Select TWO)
A. Apostrophe.
B. Commas.
C. Symbols.
D. Duplicates.
E. Misspellings.
Answer: D, E
Explanation: D. Duplicates and E. Misspellings.
The most common data quality issues are difficult to resolve in Excel because of its rigidity. Excel forces analysts to do a great deal of manual work, which creates a high probability that an error will be introduced into the data set. Those common issues include:
- Blanks
- Nulls
- Outliers
- Duplicates
- Extra spaces
- Misspellings
- Abbreviations and domain-specific variations
- Formula error codes
When introduced, these errors can skew or even invalidate the resulting analysis. A smart tool would minimize the possibility of error by automating the manual work.
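As a minimal sketch of that kind of automation, the Python function below scans one column of text values (a plain list standing in for a spreadsheet column, an assumption made for illustration) for four of the issues listed above: blanks, extra spaces, duplicates, and likely misspellings. The function name `find_quality_issues` and the 0.85 similarity threshold are hypothetical choices, not features of Excel or any particular tool; misspellings are guessed with the standard library's `difflib.SequenceMatcher` similarity ratio.

```python
from collections import Counter
from difflib import SequenceMatcher


def find_quality_issues(values, similarity=0.85):
    """Scan one column of text values for common data quality issues.

    Returns a dict with the row indexes of blanks and extra spaces, the
    (case-insensitive) values that appear more than once, and pairs of
    distinct values similar enough to be possible misspellings.
    """
    issues = {"blanks": [], "extra_spaces": [], "duplicates": [], "possible_misspellings": []}

    for row, value in enumerate(values):
        if value is None or value.strip() == "":
            issues["blanks"].append(row)
        elif value != " ".join(value.split()):
            # Leading, trailing, or repeated internal whitespace
            issues["extra_spaces"].append(row)

    # Normalize whitespace and case before comparing values
    cleaned = [" ".join(v.split()).lower() for v in values if v is not None and v.strip()]
    counts = Counter(cleaned)
    issues["duplicates"] = sorted(v for v, n in counts.items() if n > 1)

    # Compare every pair of distinct values; a high similarity ratio
    # between two different spellings suggests a typo
    distinct = sorted(counts)
    for i, a in enumerate(distinct):
        for b in distinct[i + 1:]:
            if SequenceMatcher(None, a, b).ratio() >= similarity:
                issues["possible_misspellings"].append((a, b))
    return issues


cities = ["Chicago", "Boston", "chicago", "Bostn", "", "New  York", "Denver"]
report = find_quality_issues(cities)
print(report)
# {'blanks': [4], 'extra_spaces': [5], 'duplicates': ['chicago'], 'possible_misspellings': [('bostn', 'boston')]}
```

The pairwise comparison grows quadratically with the number of distinct values, so this sketch suits small columns; the flagged pairs are candidates for a human to review, not automatic corrections.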
In Excel, you might look for data quality issues in one of two ways: you might use AutoFilter on specific columns to scan for anomalies and blanks, or you might use a pivot table to find gaps and discrepancies. In either case, you are scanning for the anomalies yourself, which is not a very efficient process. It also means accuracy is only as good as the analyst's eye, so the probability of error varies throughout the day.

12. Consider this dataset showing the retirement ages of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The following table shows a simple frequency distribution of the retirement age data:
Age | Frequency
54 | 3
55 | 1
56 | 1
57 | 2
58 | 2
60 | 2
Which of the following is the mode of this distribution?
A. 56
B. 55
C. 57
D. 54
Answer: D
Explanation: A measure of central tendency (also called a measure of centre or central location) is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution. There are three main measures of central tendency: the mode, the median, and the mean. Each describes a different aspect of the typical or central value in the distribution.
What is the mode? The mode is the most commonly occurring value in a distribution. The most commonly occurring value here is 54 (three occurrences), so the mode of this distribution is 54 years.

13. Which of the following is the range, a measure of dispersion, of the scores of ten students on a test? The scores are 17, 23, 30, 36, 45, 51, 58, 66, 72, 77.
A. 90
B. 60
C. 70
D. 80
Answer: B
Explanation: The correct answer is 60. The range is the interval between the highest and the lowest score. It measures how scattered the observations are among themselves, but it gives no idea of how the observations are spread around a central value. Symbolically, R = Hs - Ls.
Here R is the range, Hs is the highest score, and Ls is the lowest score. For the ten scores 17, 23, 30, 36, 45, 51, 58, 66, 72, 77, the highest score is 77 and the lowest is 17, so the range is the difference between them: R = 77 - 17 = 60.

14. A data scientist wants to see which products make the most money and which products attract the most customer purchasing interest in the company. Which of the following data manipulation techniques would the data scientist use to obtain this information?
A. Data append
B. Data blending
C. Normalize data
D. Data merge
Answer: B
Explanation: The correct answer is B: Data blending. Data blending combines multiple data sources to create a single, new dataset, which can be presented visually in a dashboard or other visualization and then processed or analyzed. Enterprises get their data from a variety of sources, and users may want to temporarily bring together different datasets to compare data relationships or answer a specific question.
Data append is incorrect. A data append adds new data elements to an existing database. A common example is enhancing a company's customer files: the company's information is matched against a larger database of business data so that the desired missing fields can be added.
Normalize data is incorrect. Data normalization is the process of structuring a relational customer database according to a series of normal forms. This improves the accuracy and integrity of the data and makes the database easier to navigate.
Data merge is incorrect. Data merging is the process of combining two or more data sets into a single data set.

15. A data analyst wants to create "Income Categories" that would be calculated based on the existing variable "Income".
The "Income Categories" would be as follows:
Income category 1: less than $1.
Income category 2: more than $1 and less than $20,000.
Income category 3: more than $20,001 and less than $40,000.
Income category 4: more than $40,001.
Which of the following data manipulation techniques should the data analyst use to create "Income Categories"?
A. Data merge
B. Derived variables
C. Data blending
D. Data append
Answer: B
Explanation: The correct answer is B: Derived variables. Derived variables are variables that you create by calculating or categorizing variables that already exist in your data set; here, each income category is derived from the existing "Income" value.
Data merge is incorrect. Data merging is the process of combining two or more data sets into a single data set.
Data blending is incorrect. Data blending involves pulling data from different sources and creating a single, unique dataset for visualization and analysis.
Data append is incorrect. A data append is a process that involves adding new data elements to an existing database.

16. Angela is aggregating data from a CRM system with data from an employee system. While performing an initial quality check, she realizes that her employee ID is not associated with her identifier in the CRM system. What kind of issue is Angela facing? Choose the best answer.
A. ETL process.
B. Record linkage.
C. ELT process.
D. System integration.
Answer: B
Explanation: While this scenario describes a system integration challenge that can be solved with ETL or ELT, Angela is facing a record linkage issue: the same person's records in the two systems are not linked by a common identifier.

17. Andy is a pricing analyst for a retailer. Using a hypothesis test, he wants to assess whether people who receive electronic coupons spend more on average. What should Andy's null hypothesis be?
A. People who receive electronic coupons spend more on average.
B. People who receive electronic coupons spend less on average.
C. People who receive electronic coupons do not spend more on average.
D. People who do not receive electronic coupons spend more on average.
Answer: C
Explanation: The null hypothesis presumes the status quo. Andy is testing whether people who receive an electronic coupon spend more on average, so the null hypothesis states that people who receive the coupon do not spend more on average. The claim he wants to find evidence for (that they do spend more) is the alternative hypothesis.

18. Amanda needs to create a dashboard that will draw information from many other data sources and present it to business leaders. Which one of the following tools is least likely to meet her needs?
A. QuickSight.
B. Tableau.
C. Power BI.
D. SPSS Modeler.
Answer: D
Explanation: SPSS Modeler. QuickSight, Tableau, and Power BI are all powerful analytics and reporting tools that can pull data from a variety of sources. SPSS Modeler is a predictive analytics platform designed to bring predictive intelligence to decisions made by individuals, groups, systems, and the enterprise, rather than a dashboarding tool.

19. Daniel is using Structured Query Language (SQL) to work with data stored in a relational database. He would like to add several new rows to a database table. What command should he use?
A. SELECT.
B. ALTER.
C. INSERT.
D. UPDATE.
Answer: C
Explanation: INSERT. The INSERT command is used to add new records to a database table. The SELECT command is used to retrieve information from a database; it is the most commonly used command in SQL because it is used to pose queries to the database and retrieve the data you are interested in working with. The UPDATE command is used to modify existing rows in a table. The ALTER command changes the structure of an existing table, for example by adding a column. The CREATE command is used to create a new table within your database or a new database on your server.

20. Jhon is working on an ELT process that sources data from six different source systems. Looking at the source data, he finds that data about the same people exists in two of the six systems.
What does he have to make sure he checks for in his ELT process? Choose the best answer.
A. Duplicate Data.
B. Redundant Data.
C. Invalid Data.
D. Missing Data.
Answer: A
Explanation: Duplicate Data. While invalid, redundant, and missing data are all valid concerns, data about the same people exists in two of the six systems. As such, Jhon needs to account for duplicate data issues.

21. Samantha needs to share a list of her organization's top 50 customers with the VP of sales. She would like to include the name of the customer, the business they represent, their contact information, and their total sales over the past year. The VP does not have any specialized analytics skills or software but would like to make some personal notes on the dataset. What would be the best tool for Samantha to use to share this information?
A. Power BI.
B. Microsoft Excel.
C. Minitab.
D. SAS.
Answer: B
Explanation: Microsoft Excel. This scenario presents a very simple use case: the business leader needs a dataset in an easy-to-access form and will not be performing any detailed analysis. A spreadsheet such as Microsoft Excel is the best tool for this job, and it lets the VP add personal notes. There is no need to use a statistical analysis package such as SAS or Minitab, which would likely confuse the VP without adding any value. The same is true of an integrated analytics suite such as Power BI.

22. Alex wants to use data from his corporate sales, CRM, and shipping systems to try to predict future sales. Which of the following systems is the most appropriate? Choose the best answer.
A. Data mart.
B. OLAP.
C. Data Warehouse.
D. OLTP.
Answer: C
Explanation: Correct answer: C. Data Warehouse. Data warehouses bring together data from multiple systems used by an organization. A data mart is too narrow, as Alex needs data from across multiple divisions.
OLAP is a broad term for analytical processing, and OLTP systems are transactional and not ideal for this task.

23. Analytics reports should follow corporate style guidelines.
A. True.
B. False.
Answer: A

24. Which one of the following is a measure of dispersion?
A. Variance.
B. Mode.
C. Median.
D. Mean.
Answer: A

25. Which one of the following is NOT a common data integration tool?
A. XSS
B. ELT
C. ETL
D. APIs
Answer: A
Explanation: ELT, ETL, and APIs are all common ways to integrate data; XSS is not. Cross-site scripting (XSS) is a security vulnerability usually found in websites and web applications that accept user input. XSS is a client-side vulnerability that targets other application users, while SQL injection is a server-side vulnerability that targets the application's database. To prevent XSS in PHP, filter your inputs against an allowlist of permitted characters, use type hints or type casting, and escape output (for example with htmlspecialchars()).

26. Which one of the following is a common data warehouse schema?
A. Snowflake.
B. Square.
C. Spiral.
D. Sphere.
Answer: A
Explanation: In a snowflake schema, a central fact table links to dimension tables, and some of those dimensions link in turn to further dimension tables. Together with the star schema, in which every dimension links directly to the fact table, it is one of the two common data warehouse design patterns. Square, spiral, and sphere are not data warehouse schemas. (The snowflake schema should not be confused with Snowflake, a separate cloud data platform.)

27. What would be an example of an acceptable form of primary identification for the Data+ exam?
A. Passport.
B. School ID card.
C. Employee ID card.
D. Credit card with photo and signature.
Answer: A

28. You are working with a professional statistician to perform an analysis and would like to use a statistics package. Which one of the following would be the most appropriate?
A. RapidMiner.
B. Qlik.
C. Power BI.
D. Minitab.
Answer: D
Explanation: Minitab is statistical analysis software. It can be used for learning about statistics as well as for statistical research.
Statistical analysis applications have the advantage of being accurate, reliable, and generally faster than computing statistics and drawing graphs by hand.

29. What SQL command is used to delete an entire table from a database?
A. DROP.
B. MODIFY.
C. DELETE.
D. ALTER.
Answer: A

30. Which one of the following programming languages is specifically designed for use in analytics applications?
A. Python.
B. R.
C. C++.
D. Java.
Answer: B

31. What role in data governance is typically responsible for day-to-day oversight of data use?
A. Data processors.
B. Data custodians.
C. Data owners.
D. Data stewards.
Answer: D

32. What category of data stewardship work is focused on ensuring that the organization respects the wishes of data subjects?
A. Data quality.
B. Data privacy.
C. Data security.
D. Regulatory compliance.
Answer: B
Explanation: Data privacy defines who has access to data, while data protection provides the tools and policies that actually restrict access to it. Compliance regulations help ensure that companies carry out users' privacy requests, and companies are responsible for taking measures to protect private user data.
Why is data privacy important? When data that should be kept private gets into the wrong hands, bad things can happen. A data breach at a government agency can, for example, put top-secret information in the hands of an enemy state. A breach at a corporation can put proprietary data in the hands of a competitor.

33. You are working with a dataset and need to swap the values in rows with those in columns. What action do you need to perform?
A. Recoding.
B. Filtering.
C. Aggregation.
D. Transposition.
Answer: D
Explanation: Transposition creates a new data file in which the rows and columns of the original data file are swapped, so that cases (rows) become variables and variables (columns) become cases. The Transpose command automatically creates new variable names and displays a list of them.
Transposing data is useful for data analysis. At times, we have to pull data from various files with different formats to analyze it and prepare reports. In such circumstances, we may have to transpose some data from one file to another. In Excel, we can transpose data in multiple ways.

34. When analyzing the values of two variables, you decide to convert both variables so they are on a scale of 0 to 1. What term describes this action?
A. Filtering.
B. Normalization.
C. Transposition.
D. Aggregation.
Answer: B
Explanation: Normalization, in this sense, rescales the values of a variable to a common range, typically 0 to 1. With min-max normalization, each value x becomes (x - min) / (max - min), so the smallest value maps to 0 and the largest maps to 1. Putting both variables on the same scale makes them directly comparable.
The same word also describes database normalization, a different process: reorganizing data in a database so that there is no redundancy (all data is stored in only one place) and data dependencies are logical (all related data items are stored together). It also standardizes the formats of specific fields and records, so that data looks, reads, and can be used the same way across all records.

35. Taylor wants to investigate how manufacturing, marketing, and sales expenditures impact overall profitability for her company. Which of the following systems is the most appropriate?
A. OLTP.
B. OLAP.
C. Data warehouse.
D. Data mart.
Answer: C
Explanation: A data mart is too narrow, because Taylor needs data from across multiple divisions. OLAP is a broad term for analytical processing, and OLTP systems are transactional and not ideal for the task. Since Taylor is working with data across multiple divisions, she will work with a data warehouse.
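The min-max normalization from question 34 can be sketched in a few lines of Python. The function name `min_max_normalize` and the two sample variables are hypothetical; the point is only to show that x' = (x - min) / (max - min) puts variables measured on very different scales onto the same 0-to-1 range.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so there is no spread to rescale
        raise ValueError("cannot normalize a variable whose values are all equal")
    return [(x - lo) / (hi - lo) for x in values]


# Two hypothetical variables measured on very different scales
incomes = [20000, 35000, 50000, 80000]
ages = [25, 40, 55, 65]

print(min_max_normalize(incomes))  # [0.0, 0.25, 0.5, 1.0]
print(min_max_normalize(ages))     # [0.0, 0.375, 0.75, 1.0]
```

A constant variable has no range to divide by, so the sketch raises an error instead of dividing by zero.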
36. Emma is working in a data warehouse and finds that a finance fact table links to an organization dimension, which in turn links to a currency dimension that is not linked to the fact table. What type of design pattern is the data warehouse using?
A. Star.
B. Sun.
C. Snowflake.
D. Comet.
Answer: C
Explanation: Correct answer C. Snowflake. Because one dimension links to another dimension that isn't connected to the fact table, the design must be a snowflake. In a star schema, all dimensions link directly to the fact table. Sun and comet are not data warehouse design patterns.

37. Encryption is a mechanism for protecting data. When should encryption be applied to data? Choose the best answer.
A. When data is at rest.
B. When data is at rest or in transit.
C. When data is in transit.
D. When data is at rest, unless you are using local storage.
Answer: B
Explanation: Correct answer B. When data is at rest or in transit. To provide maximum protection, encrypt data both in transit and at rest.

38. What subset of Structured Query Language (SQL) is used to add, remove, modify, or retrieve the information stored within a relational database?
A. DDL.
B. DSL.
C. DQL.
D. DML.
Answer: D
Explanation: Correct answer D. DML. The Data Manipulation Language (DML) is used to work with the data stored in a database. DML includes the SELECT, INSERT, UPDATE, and DELETE commands. The Data Definition Language (DDL) contains the commands used to create and structure a relational database. It includes the CREATE, ALTER, and DROP commands. DDL and DML are the two sublanguages you will use most often, but they are not the only ones: SQL also includes the Data Control Language (DCL), with GRANT and REVOKE, and transaction control commands such as COMMIT and ROLLBACK. Some references also treat SELECT as a separate Data Query Language (DQL).

39. Which of the following roles is responsible for ensuring an organization's data quality, security, privacy, and regulatory compliance?
A. Data owner.
B. Data steward.
C. Data custodian.
D. Data processor.
Answer: B
Explanation: Correct answer B. Data steward.
A data steward is responsible for leading an organization's data governance activities, which include data quality, security, privacy, and regulatory compliance.

40. Jenny wants to study the academic performance of undergraduate sophomores and determine the average grade point average at different points during an academic year. What best describes the data set she needs?
A. Sample.
B. Observation.
C. Variable.
D. Population.
Answer: A
Explanation: A. Sample. Jenny does not have data for the entire population of all undergraduate sophomores. While a specific grade point average is an observation of a variable, Jenny needs sample data.

41. Mario works with a group of R programmers tasked with copying data from an accounting system into a data warehouse. In what phase are the group's R skills most relevant?
A. Extract.
B. Load.
C. Transform.
D. Purge.
Answer: C
Explanation: C. Transform. The R programming language is used to manipulate and model data. In the ETL process, this activity normally takes place during the Transform phase. The Extract and Load phases typically use database-centric tools, and purging data from a database is typically done using SQL.

42. Which one of the following tools would not be considered a fully featured analytics suite?
A. Minitab.
B. MicroStrategy.
C. Domo.
D. Power BI.
Answer: A
Explanation: A. Minitab. Power BI, Domo, and MicroStrategy are all analytics suites offering features that fill many different needs within the analytics process. Minitab is a statistical analysis package that lacks many of these capabilities.

43. Andrew conducts a study and wants to capture eye color. What kind of data is eye color? Choose the best response.
A. Discrete.
B. Categorical.
C. Continuous.
D. Alphanumeric.
Answer: B
Explanation: B. Categorical. Eye color can only take one of a limited set of values (such as brown, blue, or green); as such, it is categorical.
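To make the DDL/DML split from question 38 concrete, the sketch below uses Python's built-in `sqlite3` module with a throwaway in-memory database. The `menu` table and its rows are hypothetical. It also shows the INSERT command from question 19, the DROP command from question 29, and the `?` placeholders of the parameterized queries described in question 7: values are passed separately from the SQL text and bound when the statement runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: define and change the structure of the database
cur.execute("CREATE TABLE menu (item TEXT NOT NULL, price REAL NOT NULL)")
cur.execute("ALTER TABLE menu ADD COLUMN category TEXT")

# DML: add, modify, remove, and retrieve rows.
# The ? placeholders are bound parameters: values are passed separately
# from the SQL text, so user input cannot change the statement itself.
rows = [("Soup", 4.50, "Starter"), ("Burger", 11.00, "Main"), ("Pie", 5.25, "Dessert")]
cur.executemany("INSERT INTO menu (item, price, category) VALUES (?, ?, ?)", rows)
cur.execute("UPDATE menu SET price = ? WHERE item = ?", (12.00, "Burger"))
cur.execute("DELETE FROM menu WHERE category = ?", ("Dessert",))
result = cur.execute("SELECT item, price FROM menu ORDER BY item").fetchall()
print(result)  # [('Burger', 12.0), ('Soup', 4.5)]

# DDL again: DROP removes the entire table, not just its rows
cur.execute("DROP TABLE menu")
tables = cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
conn.close()
```

SQLite is used here only because it ships with Python; other databases use the same SQL commands, though their placeholder syntax and column types can differ.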
44. Harry is looking at home sale prices in a single zip code and notices that one home sold for $940,394 when the average selling price of similar homes is $210,420. What type of data does the $940,394 sales price represent? Choose the best answer.
A. Duplicate data.
B. Data outlier.
C. Redundant data.
D. Invalid data.
Answer: B
Explanation: B. Data outlier. Since the value is more than four times the average ($940,394 / $210,420 ≈ 4.5), the $940,394 value is an outlier.

45. Melissa wants to explore central tendency in her dataset. Which statistic best matches her need?
A. Interquartile range.
B. Range.
C. Median.
D. Standard deviation.
Answer: C
Explanation: C. Median. The median is the middle observation of a variable and is, therefore, a measure of central tendency. Interquartile range is a measure of position. Range and standard deviation are both measures of dispersion.

46. Olivia has 15 people on her data analytics team. Her team's charter requires that all team members have read access to the finance, human resources, sales, and customer service areas of the corporate data warehouse. What is the best way to provision access for her team? Choose the best answer.
A. Since there are 15 people on her team, create a role for each person to improve security.
B. Since there are four discrete data subjects, create one role for each subject area.
C. Enable multifactor authentication (MFA) to protect the data.
D. Create a single role that includes finance, human resources, sales, and customer service data.
Answer: D
Explanation: D. Create a single role that includes finance, human resources, sales, and customer service data. While MFA is a good security practice, it doesn't govern access to data. Because every team member needs the same access, creating a single role for her team and assigning that role to the individuals on the team is the best approach.

47. What is the median of the following numbers?
13, 2, 65, 3, 5, 4, 7, 3, 4, 7, 8, 2, 4, 4, 60, 23, 43, 2
A. 4
B. 4.5
C. 63
D. 18
Answer: B
Explanation: B. 4.5. To find the median, sort the numbers in the dataset and find the one located in the middle:
2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 7, 7, 8, 13, 23, 43, 60, 65
In this case, there is an even number of observations, so we take the two middle numbers (the 9th and 10th values, 4 and 5) and use their average as the median, making the median 4.5. The mode is 4, the range is 63, and the number of observations is 18.

48. Oliver is designing an ETL process to copy sales data into a data warehouse on an hourly basis. What approach should Oliver choose that would be most efficient and minimize the chance of losing historical data?
A. Bulk load.
B. Purge and load.
C. Use ELT instead of ETL.
D. Delta load.
Answer: D
Explanation: D. Delta load. Since Oliver needs to migrate changes every hour, a delta load, which copies only the records that are new or changed since the last load, is the best approach. A purge and load deletes the existing data before reloading it, which is less efficient and risks losing history.

49. James wants to analyze profit based on sales of five different product categories. His source data set consists of 5.8 million rows with columns including region, product category, product name, and sales price. How should he manipulate the data to facilitate his analysis? Choose the best answer.
A. Transpose by region and summarize.
B. Transpose by product category and summarize.
C. Transpose by product name and summarize.
D. Transpose by sales price and summarize.
Answer: B
Explanation: B. Transpose by product category and summarize. We can transpose this data by product category to perform the analysis broken out by product category. Transposing by sales price, region, or product name will not further his stated analytical goal.

50. According to the empirical rule, what percent of the values in a sample fall within three standard deviations of the mean in a normal distribution?
A. 99.70%
B. 95%
C. 90%
D. 68%
Answer: A
Explanation: A. 99.70%. According to the empirical rule, 68% of values are within one standard deviation of the mean, 95% are within two standard deviations, and 99.7% are within three standard deviations.

51. Chris is building a database to store prices for items on a restaurant menu. What data type is most appropriate for this field?
A. Numeric.
B. Date.
C. Tags.
D. Alphanumeric.
Answer: A
Explanation: A. Numeric. Prices are numbers stored in dollars and cents; as such, the data type needs to be capable of storing numbers, including decimal values.

52. George is conducting a survey. He intends to distribute the survey via email and wants to optionally follow up with respondents based on their answers. What quality dimension is most vital to the success of George's survey? Choose the best answer.
A. Completeness.
B. Accuracy.
C. Consistency.
D. Validity.
Answer: A
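The arithmetic behind questions 12, 13, 47, and 50 can be double-checked with Python's standard `statistics` and `math` modules. This is a sketch for verification only; the variable names are illustrative. For the empirical rule, it uses the fact that the share of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)), which the rule rounds to 68%, 95%, and 99.7%.

```python
import math
import statistics

# Question 47: the 18 observations
numbers = [13, 2, 65, 3, 5, 4, 7, 3, 4, 7, 8, 2, 4, 4, 60, 23, 43, 2]
median = statistics.median(numbers)  # even count: average of the 9th and 10th sorted values (4 and 5)
mode = statistics.mode(numbers)  # most frequent value (4 appears four times)
value_range = max(numbers) - min(numbers)  # 65 - 2
print(len(numbers), median, mode, value_range)  # 18 4.5 4 63

# Question 12: mode of the retirement ages
retirement_ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
print(statistics.mode(retirement_ages))  # 54

# Question 13: range of the test scores
scores = [17, 23, 30, 36, 45, 51, 58, 66, 72, 77]
print(max(scores) - min(scores))  # 60

# Question 50: percent of a normal distribution within k standard deviations
shares = {k: round(math.erf(k / math.sqrt(2)) * 100, 1) for k in (1, 2, 3)}
print(shares)  # {1: 68.3, 2: 95.4, 3: 99.7}
```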